Recently, India has experienced an increase in COVID-19 cases and deaths. With insufficient medical resources and poor living conditions, hospitals have unfortunately had to turn many people away in favor of patients with more severe cases. In this notebook, we will attempt to predict the number of deaths in India over the next 3 months using machine learning techniques. This information could help hospitals and the government better decide how much money and how many resources need to be allocated in their respective areas.
https://github.com/datameet/covid19
This data combines multiple data sources from Indian government websites into a cleaner and more accessible format. The government sources for COVID-19 data are the Ministry of Health & Family Welfare and the Indian Council of Medical Research (ICMR). The data is stored in multiple files representing different pieces of data, such as the total number of cases as a time series, and the number of cases and deaths per state. Each piece of data is stored as a .json file and will need to be preprocessed so that we can work with it more easily.
This JSON file includes a list of states in India along with important geographical information. This will be used later in the project when we visualize various COVID-19 parameters on a map of India.
We will start by importing some of the libraries we will need. The requests library is used to fetch the .json file from the internet and read its contents as JSON. The json library is then used to save that data to a local JSON file and convert it into a dictionary. Finally, we use pandas to convert the dictionary into a dataframe so that we can more easily plot and visualize the data.
import requests
import json
import pandas as pd
Because the data is stored in a specific JSON format, we need to read the JSON file from the web and read it into a JSON data structure. We will start by acquiring the all_totals.json file, which contains the totals for the number of active cases, number of deaths, number of people cured, and the total number of confirmed cases, all with the associated timestamps.
# Takes a JSON file from the data source and places its contents inside a dictionary
def get_json(json_file):
    link = 'https://raw.githubusercontent.com/datameet/covid19/master/data/' + json_file
    r = requests.get(link)
    data = r.json()
    # takes the data from link and saves it in a JSON file
    with open(json_file, 'w') as f:
        json.dump(data, f)
    # takes the JSON file and places the contents in a dictionary
    with open(json_file) as f:
        data = json.load(f)
    return data
data = get_json('all_totals.json')
data['rows'][0]
{'key': ['2020-01-30T13:33:00.00+05:30', 'active_cases'], 'value': 1}
As you can see, the data is stored in a key-value pair format, where the key contains the timestamp and attribute name, while the value contains the number associated with that attribute. We can wrangle this data format into a more table-like structure so that we can convert this dictionary into a dataframe.
# Takes input from the JSON dictionary and converts it into
# a new dictionary with a different format. The attributes
# found in the input ("cases", "deaths", etc.) are keys in the
# new dictionary, and the values are a single key-value pair
# of time and the value corresponding to the attribute.
def extract_data(data):
    extracted_data = {}
    rows = data["rows"]
    for row in rows:
        time, attribute, value = extract_row(row)
        if attribute not in extracted_data:
            extracted_data[attribute] = {time : value}
        else:
            extracted_data[attribute][time] = value
    return extracted_data
# Extracts the time, attribute, and value information
# from one row of the JSON dictionary. Used as a helper
# function in extract_data()
def extract_row(row):
    time = row['key'][0]
    attribute = row['key'][1]
    value = row['value']
    return time[:10], attribute, value
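As a quick sanity check, here is the helper applied to the sample row we printed earlier, restated as a self-contained snippet:

```python
# Self-contained restatement of extract_row, applied to the sample row
# shown earlier. The timestamp is truncated to its date portion.
def extract_row(row):
    time = row['key'][0]
    attribute = row['key'][1]
    value = row['value']
    return time[:10], attribute, value

row = {'key': ['2020-01-30T13:33:00.00+05:30', 'active_cases'], 'value': 1}
print(extract_row(row))  # ('2020-01-30', 'active_cases', 1)
```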
data_dict = extract_data(data)
df = pd.DataFrame(data_dict)
df
| | active_cases | cured | death | total_confirmed_cases |
|---|---|---|---|---|
| 2020-01-30 | 1 | 0 | 0 | 1 |
| 2020-02-02 | 2 | 0 | 0 | 2 |
| 2020-02-03 | 3 | 0 | 0 | 3 |
| 2020-03-02 | 5 | 0 | 0 | 5 |
| 2020-03-03 | 6 | 0 | 0 | 6 |
| ... | ... | ... | ... | ... |
| 2021-05-09 | 3736648 | 18317404 | 242362 | 22296414 |
| 2021-05-10 | 3745237 | 18671222 | 246116 | 22662575 |
| 2021-05-11 | 3715221 | 19027304 | 249992 | 22992517 |
| 2021-05-12 | 3704099 | 19382642 | 254197 | 23340938 |
| 2021-05-13 | 3710525 | 19734823 | 258317 | 23703665 |
435 rows × 4 columns
There is a problem here with the date range: it is not continuous! If we ever want to visualize the data, it will be important to have a continuous date range. We can fix this by adding new rows for the missing days and simply carrying the previous row's values forward into them. Filling in missing values this way is valid because all of the metrics in the dataset are cumulative.
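Before applying this to our dataframe, here is a toy illustration (with made-up dates and values, not the notebook's data) of reindexing a cumulative series onto a continuous daily range with forward-fill:

```python
import pandas as pd

# A cumulative counter observed on a gappy set of dates...
s = pd.Series([1, 3, 6],
              index=pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-06"]))

# ...forward-filled onto a continuous daily range: each missing day
# takes the most recent earlier observation.
idx = pd.date_range(s.index.min(), s.index.max())
filled = s.reindex(idx, method="ffill")
print(filled.tolist())  # [1, 1, 3, 3, 3, 6]
```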
# calculate date range index using lowest and highest dates
idx = pd.date_range(df.index.min(), df.index.max())
df.index = pd.DatetimeIndex(df.index)
# reindex the data using date range, and fill any missing dates
# with the previous date's row values
df = df.reindex(index=idx, method='ffill')
df
| | active_cases | cured | death | total_confirmed_cases |
|---|---|---|---|---|
| 2020-01-30 | 1 | 0 | 0 | 1 |
| 2020-01-31 | 1 | 0 | 0 | 1 |
| 2020-02-01 | 1 | 0 | 0 | 1 |
| 2020-02-02 | 2 | 0 | 0 | 2 |
| 2020-02-03 | 3 | 0 | 0 | 3 |
| ... | ... | ... | ... | ... |
| 2021-05-09 | 3736648 | 18317404 | 242362 | 22296414 |
| 2021-05-10 | 3745237 | 18671222 | 246116 | 22662575 |
| 2021-05-11 | 3715221 | 19027304 | 249992 | 22992517 |
| 2021-05-12 | 3704099 | 19382642 | 254197 | 23340938 |
| 2021-05-13 | 3710525 | 19734823 | 258317 | 23703665 |
470 rows × 4 columns
Now that we have the data in a manageable format, we can start visualizing it. Let's start by plotting the number of active cases, total number of confirmed cases, number of deaths, and number of people cured.
df.plot(y='active_cases')
<AxesSubplot:>
df.plot(y='total_confirmed_cases')
<AxesSubplot:>
df.plot(y='death')
<AxesSubplot:>
df.plot(y='cured')
<AxesSubplot:>
We can see very clearly from these plots that cases and deaths began skyrocketing around late March to early April 2021. This is when a new strain of the virus arrived in India. But what made this new strain so difficult to handle compared to previous ones, and does the situation differ by state? Let us explore this question by looking at the COVID-19 data partitioned by state, which can be found in the Ministry of Health's data in mohfw.json.
data2 = get_json('mohfw.json')
data2['rows'][0]
{'id': '2020-01-30T13:33:00.00+05:30|kl',
'key': '2020-01-30T13:33:00.00+05:30|kl',
'value': {'_id': '2020-01-30T13:33:00.00+05:30|kl',
'_rev': '2-727ff11254cdb1c043ab19b2714b3cab',
'report_time': '2020-01-30T13:33:00.00+05:30',
'state': 'kl',
'confirmed_india': 1,
'confirmed_foreign': 0,
'cured': 0,
'death': 0,
'source': 'mohfw_pib',
'type': 'cases',
'confirmed': 1}}
It seems that this JSON file's format is slightly different from that of all_totals.json. We can see that the rows of the data are essentially a set of key-value pairings. Let's further inspect the contents of the JSON.
data2['rows'][5].keys()
dict_keys(['id', 'key', 'value'])
data2['rows'][5]['value'].keys()
dict_keys(['_id', '_rev', 'report_time', 'state', 'confirmed_india', 'confirmed_foreign', 'cured', 'death', 'source', 'type', 'confirmed'])
# Adds two tuples together. This is useful for adding all of the cases
# for each state that were recorded at various timestamps.
def add_tuples(a, b):
    return tuple(map(lambda i, j: i + j, a, b))
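For instance, element-wise addition of two (confirmed, cured, death) tuples behaves like this (restated as a self-contained snippet):

```python
# Self-contained restatement of add_tuples: element-wise sum of two tuples.
def add_tuples(a, b):
    return tuple(map(lambda i, j: i + j, a, b))

# Adding (confirmed, cured, death) counts from two records:
print(add_tuples((5, 2, 1), (10, 4, 0)))  # (15, 6, 1)
```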
# Given a state abbreviation, this function will map it to
# the corresponding state name.
def get_state_from_abbrev(abbrev):
    # Dictionary mapping state abbreviations to their full names
    states = {
        "ap": "Andhra Pradesh",
        "ar": "Arunachal Pradesh",
        "as": "Assam",
        "br": "Bihar",
        "ct": "Chhattisgarh",
        "ga": "Goa",
        "gj": "Gujarat",
        "hr": "Haryana",
        "hp": "Himachal Pradesh",
        "jh": "Jharkhand",
        "ka": "Karnataka",
        "kl": "Kerala",
        "mp": "Madhya Pradesh",
        "mh": "Maharashtra",
        "mn": "Manipur",
        "ml": "Meghalaya",
        "mz": "Mizoram",
        "nl": "Nagaland",
        "or": "Odisha",
        "pb": "Punjab",
        "rj": "Rajasthan",
        "sk": "Sikkim",
        "tn": "Tamil Nadu",
        "tg": "Telangana",
        "tr": "Tripura",
        "ut": "Uttarakhand",
        "up": "Uttar Pradesh",
        "wb": "West Bengal",
        "an": "Andaman and Nicobar Islands",
        "ch": "Chandigarh",
        "dn": "Dadra and Nagar Haveli",
        "dd": "Daman and Diu",
        "dl": "Delhi",
        "jk": "Jammu and Kashmir",
        "la": "Ladakh",
        "ld": "Lakshadweep",
        "py": "Pondicherry",
        "dn_dd": "Dadra and Nagar Haveli and Daman and Diu",
        "unassigned": "unassigned"
    }
    return states[abbrev]
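Note that a plain dictionary lookup like this raises a KeyError if the source ever adds an abbreviation we have not listed. A defensive variant (a sketch, not part of the original pipeline; only a few sample entries shown) would fall back to the raw abbreviation using dict.get:

```python
# Hypothetical defensive variant: unknown abbreviations are passed through
# unchanged instead of raising a KeyError. Only a few entries for illustration.
states_sample = {"kl": "Kerala", "mh": "Maharashtra", "tn": "Tamil Nadu"}

def get_state_from_abbrev_safe(abbrev):
    return states_sample.get(abbrev, abbrev)

print(get_state_from_abbrev_safe("mh"))  # Maharashtra
print(get_state_from_abbrev_safe("zz"))  # zz  (unknown code passed through)
```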
We now want to extract the relevant data from the dictionary that is needed to display on the maps. Right now, the data is structured as time series data. For the maps, however, we are not concerned with the time cases or deaths occurred, but rather the current numbers for those statistics. Later, when we create a machine learning model for predicting the number of deaths, we will come back to the time series data.
# Iterates over the rows of the dictionary data and extracts the relevant
# information from the rows. The dictionary is a key-value pairing between
# state and a tuple of confirmed, cured, and death.
def extract_data2(data):
    extracted_data = {}
    rows = data["rows"]
    for row in rows:
        time, state, confirmed, cured, death = extract_row2(row)
        if state not in extracted_data:
            extracted_data[state] = (confirmed, cured, death)
        else:
            res = add_tuples(extracted_data[state], (confirmed, cured, death))
            extracted_data[state] = res
    return extracted_data
# Extracts the time, state, confirmed, cured, and death information
# from one row of the JSON dictionary. Used as a helper
# function in extract_data2()
def extract_row2(row):
    values_dict = row['value']
    time = values_dict['report_time']
    state = get_state_from_abbrev(values_dict['state'])
    confirmed = values_dict['confirmed']
    cured = values_dict['cured']
    death = values_dict['death']
    return time[:10], state, confirmed, cured, death
Let's now create the reformatted dictionary and see what value is stored at a particular key, say, Kerala.
extracted2 = extract_data2(data2)
extracted2['Kerala']
(200185603, 177894118, 766581)
Now that we have the dictionary properly formatted, it's time to convert it into a pandas dataframe. pandas allows us to pass a dictionary directly into the DataFrame constructor, so we can simply pass in extracted2. For more details on how this works, see the documentation at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html.
# Creates dataframe from dictionary
df2 = pd.DataFrame(extracted2)
df2 = df2.transpose() # flips the rows and columns
df2['state'] = df2.index
df2.columns = ['cases', 'cured', 'deaths', 'state']
df2.drop('unassigned', inplace=True)
df2
| | cases | cured | deaths | state |
|---|---|---|---|---|
| Kerala | 200185603 | 177894118 | 766581 | Kerala |
| Delhi | 158725093 | 147896560 | 2741459 | Delhi |
| Telangana | 75798728 | 69580382 | 432535 | Telangana |
| Rajasthan | 77767638 | 69179429 | 700476 | Rajasthan |
| Haryana | 66067411 | 60344316 | 698864 | Haryana |
| Jammu and Kashmir | 30756599 | 27720245 | 469131 | Jammu and Kashmir |
| Karnataka | 242380060 | 216792332 | 3144790 | Karnataka |
| Ladakh | 2307394 | 2084230 | 28077 | Ladakh |
| Maharashtra | 589598167 | 512544984 | 13590012 | Maharashtra |
| Punjab | 47764245 | 42219368 | 1406438 | Punjab |
| Tamil Nadu | 223058512 | 208216246 | 3261644 | Tamil Nadu |
| Uttar Pradesh | 160155913 | 143267909 | 2199456 | Uttar Pradesh |
| Andhra Pradesh | 228887306 | 215214582 | 1848944 | Andhra Pradesh |
| Uttarakhand | 23199910 | 20397560 | 370919 | Uttarakhand |
| Odisha | 82390133 | 77558828 | 426009 | Odisha |
| West Bengal | 133436340 | 123072064 | 2330712 | West Bengal |
| Pondicherry | 9996027 | 9076464 | 163604 | Pondicherry |
| Chandigarh | 5394798 | 4840252 | 77743 | Chandigarh |
| Chhattisgarh | 75304232 | 66298558 | 875894 | Chhattisgarh |
| Gujarat | 70290062 | 62010591 | 1330100 | Gujarat |
| Himachal Pradesh | 12567361 | 11065632 | 198464 | Himachal Pradesh |
| Madhya Pradesh | 66056188 | 59307071 | 976388 | Madhya Pradesh |
| Bihar | 69575592 | 64292363 | 381133 | Bihar |
| Manipur | 6347465 | 5796243 | 72359 | Manipur |
| Mizoram | 1009532 | 910925 | 1770 | Mizoram |
| Goa | 13652667 | 12301268 | 188057 | Goa |
| Andaman and Nicobar Islands | 1283574 | 1211641 | 16203 | Andaman and Nicobar Islands |
| Assam | 56413792 | 52528287 | 260216 | Assam |
| Jharkhand | 31400674 | 28349151 | 296292 | Jharkhand |
| Arunachal Pradesh | 3962184 | 3686314 | 12013 | Arunachal Pradesh |
| Tripura | 8320110 | 7728375 | 92857 | Tripura |
| Nagaland | 2846345 | 2593039 | 17581 | Nagaland |
| Meghalaya | 3074017 | 2797072 | 30784 | Meghalaya |
| Dadra and Nagar Haveli | 186 | 14 | 0 | Dadra and Nagar Haveli |
| Sikkim | 1390467 | 1243984 | 26879 | Sikkim |
| Daman and Diu | 2 | 0 | 0 | Daman and Diu |
| Dadra and Nagar Haveli and Daman and Diu | 1022337 | 940814 | 662 | Dadra and Nagar Haveli and Daman and Diu |
| Lakshadweep | 107535 | 73104 | 164 | Lakshadweep |
Note that many records (a lot, actually!) were labelled as 'unassigned,' meaning they were not attributed to any particular state. For now we have dropped these values from the table, although further analysis might be required to truly understand their effect on the data overall.
Now we can proceed by trying to visualize the data on a map. The idea is to create three maps where we color states by their relative number of cases/cured/deaths. The first step is to scale down the data by a factor of a million. This will make the scales in our maps more readable.
# Scaling down the cases, cured, and deaths columns
df2_copy = df2.copy()
df2_copy['cases'] /= 1e6
df2_copy['cured'] /= 1e6
df2_copy['deaths'] /= 1e6
df2_copy
| | cases | cured | deaths | state |
|---|---|---|---|---|
| Kerala | 200.185603 | 177.894118 | 0.766581 | Kerala |
| Delhi | 158.725093 | 147.896560 | 2.741459 | Delhi |
| Telangana | 75.798728 | 69.580382 | 0.432535 | Telangana |
| Rajasthan | 77.767638 | 69.179429 | 0.700476 | Rajasthan |
| Haryana | 66.067411 | 60.344316 | 0.698864 | Haryana |
| Jammu and Kashmir | 30.756599 | 27.720245 | 0.469131 | Jammu and Kashmir |
| Karnataka | 242.380060 | 216.792332 | 3.144790 | Karnataka |
| Ladakh | 2.307394 | 2.084230 | 0.028077 | Ladakh |
| Maharashtra | 589.598167 | 512.544984 | 13.590012 | Maharashtra |
| Punjab | 47.764245 | 42.219368 | 1.406438 | Punjab |
| Tamil Nadu | 223.058512 | 208.216246 | 3.261644 | Tamil Nadu |
| Uttar Pradesh | 160.155913 | 143.267909 | 2.199456 | Uttar Pradesh |
| Andhra Pradesh | 228.887306 | 215.214582 | 1.848944 | Andhra Pradesh |
| Uttarakhand | 23.199910 | 20.397560 | 0.370919 | Uttarakhand |
| Odisha | 82.390133 | 77.558828 | 0.426009 | Odisha |
| West Bengal | 133.436340 | 123.072064 | 2.330712 | West Bengal |
| Pondicherry | 9.996027 | 9.076464 | 0.163604 | Pondicherry |
| Chandigarh | 5.394798 | 4.840252 | 0.077743 | Chandigarh |
| Chhattisgarh | 75.304232 | 66.298558 | 0.875894 | Chhattisgarh |
| Gujarat | 70.290062 | 62.010591 | 1.330100 | Gujarat |
| Himachal Pradesh | 12.567361 | 11.065632 | 0.198464 | Himachal Pradesh |
| Madhya Pradesh | 66.056188 | 59.307071 | 0.976388 | Madhya Pradesh |
| Bihar | 69.575592 | 64.292363 | 0.381133 | Bihar |
| Manipur | 6.347465 | 5.796243 | 0.072359 | Manipur |
| Mizoram | 1.009532 | 0.910925 | 0.001770 | Mizoram |
| Goa | 13.652667 | 12.301268 | 0.188057 | Goa |
| Andaman and Nicobar Islands | 1.283574 | 1.211641 | 0.016203 | Andaman and Nicobar Islands |
| Assam | 56.413792 | 52.528287 | 0.260216 | Assam |
| Jharkhand | 31.400674 | 28.349151 | 0.296292 | Jharkhand |
| Arunachal Pradesh | 3.962184 | 3.686314 | 0.012013 | Arunachal Pradesh |
| Tripura | 8.320110 | 7.728375 | 0.092857 | Tripura |
| Nagaland | 2.846345 | 2.593039 | 0.017581 | Nagaland |
| Meghalaya | 3.074017 | 2.797072 | 0.030784 | Meghalaya |
| Dadra and Nagar Haveli | 0.000186 | 0.000014 | 0.000000 | Dadra and Nagar Haveli |
| Sikkim | 1.390467 | 1.243984 | 0.026879 | Sikkim |
| Daman and Diu | 0.000002 | 0.000000 | 0.000000 | Daman and Diu |
| Dadra and Nagar Haveli and Daman and Diu | 1.022337 | 0.940814 | 0.000662 | Dadra and Nagar Haveli and Daman and Diu |
| Lakshadweep | 0.107535 | 0.073104 | 0.000164 | Lakshadweep |
We will use the folium library, which is a wrapper around the leaflet.js library, to display the maps. Note that these types of visualizations are called choropleths.
Choropleth Tutorial: https://vverde.github.io/blob/interactivechoropleth.html
Choropleth Documentation: https://python-visualization.github.io/folium/quickstart.html#Choropleth-maps
import folium
# Creates a map based on India's lat/long coordinates
map_osm = folium.Map(location=[22.71, 79.04], zoom_start=5)
map_osm
# This function plots a given column of the dataframe onto the map of India.
# - scheme:
# - Color range that will be used to color the different states.
# The scheme is specified by a string, and the available colors can be found at
# https://github.com/python-visualization/folium/issues/403.
# - scale:
# - List of numbers that will decide the shade of color for a particular state.
# The scale can also be seen visually when the choropleth is displayed.
def color_map(column, scheme, scale):
    map_osm = folium.Map(location=[22.71, 79.04], zoom_start=5)
    folium.Choropleth(
        # geo_data specifies the state regions that need to be colored.
        geo_data="https://gist.githubusercontent.com/jbrobst/56c13bbbf9d97d187fea01ca62ea5112/raw/e388c4cae20aa53cb5090210a42ebb9b765c0a36/india_states.geojson",
        name="choropleth",
        data=df2_copy,
        columns=["state", column],  # column of data we are interested in plotting
        key_on='properties.ST_NM',
        fill_color=scheme,
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name=f"COVID-19 {column} (in millions)",
        threshold_scale=scale
    ).add_to(map_osm)
    folium.LayerControl().add_to(map_osm)
    return map_osm
color_map("deaths", "Reds", [0, 2.5, 5, 7.5, 10, 12.5, 15])
color_map("cured", "Greens", scale=[0, 100, 200, 300, 400, 500, 600])
color_map("cases", "Purples", scale=[0, 100, 200, 300, 400, 500, 600])
From these plots, we can see that things seem pretty dire in Maharashtra, the state where Mumbai is located. Maharashtra has the greatest number of deaths, cases, and cured people. This makes sense because Mumbai, India's largest city, is located in Maharashtra, so the state contains some highly dense urban areas. In a city, especially a dense one, it is more difficult to socially distance, making it easier for COVID-19 to spread.
Looking at the other states, it appears that states in Southern India have it worse in terms of cases, deaths, and cured people compared to Northern India. Kerala, which is in Southern India, is an interesting case study: while it has similar numbers of cases and cured people, it has significantly fewer deaths than its neighboring states, such as Tamil Nadu and Karnataka.
We can now create a model for India's COVID-19 cases to predict how many deaths there will be in the next 3 months. We can do this using the data2 dictionary we loaded earlier.
data2['rows'][0]['value']['death']
0
deaths = {}
for record in data2['rows']:
    time = record['value']['report_time']
    death = record['value']['death']
    if time in deaths:
        deaths[time] += death
    else:
        deaths[time] = death
deaths
{'2020-01-30T13:33:00.00+05:30': 0,
'2020-02-02T10:39:00.00+05:30': 0,
'2020-02-03T12:13:00.00+05:30': 0,
'2020-03-02T14:28:00.00+05:30': 0,
'2020-03-03T19:36:00.00+05:30': 0,
'2020-03-10T12:00:00.00+05:30': 0,
'2020-03-11T17:30:00.00+05:30': 0,
'2020-03-12T11:00:00.00+05:30': 0,
'2020-03-12T18:00:00.00+05:30': 0,
'2020-03-13T10:00:00.00+05:30': 1,
'2020-03-13T22:15:00.00+05:30': 2,
'2020-03-14T09:00:00.00+05:30': 2,
'2020-03-14T16:55:00.00+05:30': 2,
'2020-03-15T08:55:00.00+05:30': 2,
'2020-03-15T12:00:00.00+05:30': 2,
'2020-03-15T17:00:00.00+05:30': 2,
'2020-03-15T23:30:00.00+05:30': 2,
'2020-03-16T16:00:00.00+05:30': 2,
'2020-03-17T09:15:00.00+05:30': 2,
'2020-03-17T10:55:00.00+05:30': 3,
'2020-03-17T11:52:00.00+05:30': 3,
'2020-03-17T17:15:00.00+05:30': 3,
'2020-03-18T09:00:00.00+05:30': 3,
'2020-03-18T17:15:00.00+05:30': 3,
'2020-03-19T09:00:00.00+05:30': 3,
'2020-03-19T16:30:00.00+05:30': 4,
'2020-03-19T17:00:00.00+05:30': 4,
'2020-03-20T09:00:00.00+05:30': 4,
'2020-03-20T17:00:00.00+05:30': 4,
'2020-03-21T09:00:00.00+05:30': 4,
'2020-03-21T16:45:00.00+05:30': 4,
'2020-03-22T09:45:00.00+05:30': 4,
'2020-03-22T11:45:00.00+05:30': 5,
'2020-03-22T18:30:00.00+05:30': 7,
'2020-03-23T09:00:00.00+05:30': 7,
'2020-03-23T10:30:00.00+05:30': 7,
'2020-03-23T18:00:00.00+05:30': 7,
'2020-03-23T20:15:00.00+05:30': 9,
'2020-03-24T08:45:00.00+05:30': 9,
'2020-03-24T17:45:00.00+05:30': 9,
'2020-03-24T20:15:00.00+05:30': 10,
'2020-03-25T09:15:00.00+05:30': 9,
'2020-03-25T18:45:00.00+05:30': 10,
'2020-03-26T10:15:00.00+05:30': 13,
'2020-03-26T20:00:00.00+05:30': 15,
'2020-03-27T09:15:00.00+05:30': 17,
'2020-03-28T09:30:00.00+05:30': 19,
'2020-03-28T17:45:00.00+05:30': 19,
'2020-03-29T10:00:00.00+05:30': 25,
'2020-03-29T19:30:00.00+05:30': 27,
'2020-03-30T10:30:00.00+05:30': 29,
'2020-03-30T21:30:00.00+05:30': 31,
'2020-03-31T20:30:00.00+05:30': 35,
'2020-04-01T09:00:00.00+05:30': 38,
'2020-04-01T19:30:00.00+05:30': 41,
'2020-04-02T09:00:00.00+05:30': 50,
'2020-04-02T18:00:00.00+05:30': 53,
'2020-04-03T09:00:00.00+05:30': 56,
'2020-04-03T18:00:00.00+05:30': 62,
'2020-04-04T09:00:00.00+05:30': 68,
'2020-04-04T18:00:00.00+05:30': 75,
'2020-04-05T09:00:00.00+05:30': 77,
'2020-04-05T18:00:00.00+05:30': 83,
'2020-04-06T09:00:00.00+05:30': 109,
'2020-04-06T18:00:00.00+05:30': 111,
'2020-04-07T09:00:00.00+05:30': 114,
'2020-04-07T18:00:00.00+05:30': 124,
'2020-04-08T08:00:00.00+05:30': 149,
'2020-04-08T17:00:00.00+05:30': 149,
'2020-04-09T08:00:00.00+05:30': 166,
'2020-04-09T17:00:00.00+05:30': 169,
'2020-04-10T08:00:00.00+05:30': 199,
'2020-04-10T17:00:00.00+05:30': 206,
'2020-04-11T08:00:00.00+05:30': 239,
'2020-04-11T17:00:00.00+05:30': 242,
'2020-04-12T08:00:00.00+05:30': 273,
'2020-04-12T17:00:00.00+05:30': 273,
'2020-04-13T08:00:00.00+05:30': 308,
'2020-04-13T17:00:00.00+05:30': 324,
'2020-04-14T08:00:00.00+05:30': 339,
'2020-04-14T17:00:00.00+05:30': 353,
'2020-04-15T08:00:00.00+05:30': 377,
'2020-04-15T17:00:00.00+05:30': 392,
'2020-04-16T08:00:00.00+05:30': 414,
'2020-04-16T17:00:00.00+05:30': 420,
'2020-04-17T08:00:00.00+05:30': 437,
'2020-04-17T17:00:00.00+05:30': 452,
'2020-04-18T08:00:00.00+05:30': 480,
'2020-04-18T17:00:00.00+05:30': 488,
'2020-04-19T08:00:00.00+05:30': 507,
'2020-04-19T17:00:00.00+05:30': 519,
'2020-04-20T08:00:00.00+05:30': 543,
'2020-04-20T17:00:00.00+05:30': 559,
'2020-04-21T08:00:00.00+05:30': 590,
'2020-04-21T17:00:00.00+05:30': 603,
'2020-04-22T08:00:00.00+05:30': 640,
'2020-04-22T17:00:00.00+05:30': 652,
'2020-04-23T08:00:00.00+05:30': 681,
'2020-04-23T17:00:00.00+05:30': 686,
'2020-04-24T08:00:00.00+05:30': 718,
'2020-04-24T17:00:00.00+05:30': 723,
'2020-04-25T08:00:00.00+05:30': 775,
'2020-04-25T17:00:00.00+05:30': 779,
'2020-04-26T08:00:00.00+05:30': 824,
'2020-04-26T17:00:00.00+05:30': 826,
'2020-04-27T08:00:00.00+05:30': 872,
'2020-04-27T17:00:00.00+05:30': 886,
'2020-04-28T08:00:00.00+05:30': 934,
'2020-04-28T17:00:00.00+05:30': 937,
'2020-04-29T08:00:00.00+05:30': 1007,
'2020-04-29T17:00:00.00+05:30': 1008,
'2020-04-30T08:00:00.00+05:30': 1074,
'2020-04-30T17:00:00.00+05:30': 1075,
'2020-05-01T08:00:00.00+05:30': 1147,
'2020-05-01T17:00:00.00+05:30': 1152,
'2020-05-02T08:00:00.00+05:30': 1218,
'2020-05-02T17:00:00.00+05:30': 1223,
'2020-05-03T08:00:00.00+05:30': 1301,
'2020-05-03T17:00:00.00+05:30': 1306,
'2020-05-04T08:00:00.00+05:30': 1373,
'2020-05-04T17:00:00.00+05:30': 1389,
'2020-05-05T08:00:00.00+05:30': 1568,
'2020-05-05T17:00:00.00+05:30': 1583,
'2020-05-06T08:00:00.00+05:30': 1694,
'2020-05-07T08:00:00.00+05:30': 1783,
'2020-05-08T08:00:00.00+05:30': 1886,
'2020-05-09T08:00:00.00+05:30': 1981,
'2020-05-10T08:00:00.00+05:30': 2109,
'2020-05-11T08:00:00.00+05:30': 2206,
'2020-05-12T08:00:00.00+05:30': 2293,
'2020-05-13T08:00:00.00+05:30': 2415,
'2020-05-14T08:00:00.00+05:30': 2549,
'2020-05-15T08:00:00.00+05:30': 2649,
'2020-05-16T08:00:00.00+05:30': 2752,
'2020-05-17T08:00:00.00+05:30': 2872,
'2020-05-18T08:00:00.00+05:30': 3029,
'2020-05-19T08:00:00.00+05:30': 3163,
'2020-05-20T08:00:00.00+05:30': 3303,
'2020-05-21T08:00:00.00+05:30': 3435,
'2020-05-22T08:00:00.00+05:30': 3583,
'2020-05-23T08:00:00.00+05:30': 3720,
'2020-05-24T08:00:00.00+05:30': 3867,
'2020-05-25T08:00:00.00+05:30': 4021,
'2020-05-26T08:00:00.00+05:30': 4167,
'2020-05-27T08:00:00.00+05:30': 4337,
'2020-05-28T08:00:00.00+05:30': 4531,
'2020-05-29T08:00:00.00+05:30': 4706,
'2020-05-30T08:00:00.00+05:30': 4971,
'2020-05-31T08:00:00.00+05:30': 5164,
'2020-06-01T08:00:00.00+05:30': 5394,
'2020-06-02T08:00:00.00+05:30': 5598,
'2020-06-03T08:00:00.00+05:30': 5815,
'2020-06-04T08:00:00.00+05:30': 6075,
'2020-06-05T08:00:00.00+05:30': 6348,
'2020-06-06T08:00:00.00+05:30': 6642,
'2020-06-07T08:00:00.00+05:30': 6929,
'2020-06-08T08:00:00.00+05:30': 7135,
'2020-06-09T08:00:00.00+05:30': 7466,
'2020-06-10T08:00:00.00+05:30': 7745,
'2020-06-11T08:00:00.00+05:30': 8102,
'2020-06-12T08:00:00.00+05:30': 8498,
'2020-06-13T08:00:00.00+05:30': 8884,
'2020-06-14T08:00:00.00+05:30': 9195,
'2020-06-15T08:00:00.00+05:30': 9520,
'2020-06-16T08:00:00.00+05:30': 9900,
'2020-06-17T08:00:00.00+05:30': 11903,
'2020-06-18T08:00:00.00+05:30': 12237,
'2020-06-19T08:00:00.00+05:30': 12573,
'2020-06-20T08:00:00.00+05:30': 12948,
'2020-06-21T08:00:00.00+05:30': 13254,
'2020-06-22T08:00:00.00+05:30': 13699,
'2020-06-23T08:00:00.00+05:30': 14011,
'2020-06-24T08:00:00.00+05:30': 14476,
'2020-06-25T08:00:00.00+05:30': 14894,
'2020-06-26T08:00:00.00+05:30': 15301,
'2020-06-27T08:00:00.00+05:30': 15685,
'2020-06-28T08:00:00.00+05:30': 16095,
'2020-06-29T08:00:00.00+05:30': 16475,
'2020-06-30T08:00:00.00+05:30': 16893,
'2020-07-01T08:00:00.00+05:30': 17400,
'2020-07-02T08:00:00.00+05:30': 17834,
'2020-07-03T08:00:00.00+05:30': 18213,
'2020-07-04T08:00:00.00+05:30': 18655,
'2020-07-05T08:00:00.00+05:30': 19268,
'2020-07-06T08:00:00.00+05:30': 19693,
'2020-07-07T08:00:00.00+05:30': 20160,
'2020-07-08T08:00:00.00+05:30': 20642,
'2020-07-09T08:00:00.00+05:30': 21129,
'2020-07-10T08:00:00.00+05:30': 21604,
'2020-07-11T08:00:00.00+05:30': 22123,
'2020-07-12T08:00:00.00+05:30': 22674,
'2020-07-13T08:00:00.00+05:30': 23174,
'2020-07-14T08:00:00.00+05:30': 23727,
'2020-07-15T08:00:00.00+05:30': 24309,
'2020-07-16T08:00:00.00+05:30': 24915,
'2020-07-17T08:00:00.00+05:30': 25602,
'2020-07-18T08:00:00.00+05:30': 26273,
'2020-07-19T08:00:00.00+05:30': 26816,
'2020-07-20T08:00:00.00+05:30': 27497,
'2020-07-21T08:00:00.00+05:30': 28084,
'2020-07-22T08:00:00.00+05:30': 28732,
'2020-07-23T08:00:00.00+05:30': 29861,
'2020-07-24T08:00:00.00+05:30': 30601,
'2020-07-25T08:00:00.00+05:30': 31358,
'2020-07-26T08:00:00.00+05:30': 32063,
'2020-07-27T08:00:00.00+05:30': 32771,
'2020-07-28T08:00:00.00+05:30': 33425,
'2020-07-29T08:00:00.00+05:30': 34193,
'2020-07-30T08:00:00.00+05:30': 34968,
'2020-07-31T08:00:00.00+05:30': 35747,
'2020-08-01T08:00:00.00+05:30': 36511,
'2020-08-02T08:00:00.00+05:30': 37364,
'2020-08-03T08:00:00.00+05:30': 38135,
'2020-08-04T08:00:00.00+05:30': 38938,
'2020-08-05T08:00:00.00+05:30': 39795,
'2020-08-06T08:00:00.00+05:30': 40699,
'2020-08-07T08:00:00.00+05:30': 41585,
'2020-08-08T08:00:00.00+05:30': 42518,
'2020-08-09T08:00:00.00+05:30': 43379,
'2020-08-10T08:00:00.00+05:30': 44386,
'2020-08-11T08:00:00.00+05:30': 45257,
'2020-08-12T08:00:00.00+05:30': 46091,
'2020-08-13T08:00:00.00+05:30': 47033,
'2020-08-14T08:00:00.00+05:30': 48040,
'2020-08-15T08:00:00.00+05:30': 49036,
'2020-08-16T08:00:00.00+05:30': 49980,
'2020-08-17T08:00:00.00+05:30': 50921,
'2020-08-18T08:00:00.00+05:30': 51797,
'2020-08-19T08:00:00.00+05:30': 52889,
'2020-08-20T08:00:00.00+05:30': 53866,
'2020-08-21T08:00:00.00+05:30': 54849,
'2020-08-22T08:00:00.00+05:30': 55794,
'2020-08-23T08:00:00.00+05:30': 56706,
'2020-08-24T08:00:00.00+05:30': 57542,
'2020-08-25T08:00:00.00+05:30': 58390,
'2020-08-26T08:00:00.00+05:30': 59449,
'2020-08-27T08:00:00.00+05:30': 60472,
'2020-08-28T08:00:00.00+05:30': 61529,
'2020-08-29T08:00:00.00+05:30': 62550,
'2020-08-30T08:00:00.00+05:30': 63498,
'2020-08-31T08:00:00.00+05:30': 64469,
'2020-09-01T08:00:00.00+05:30': 65288,
'2020-09-02T08:00:00.00+05:30': 66333,
'2020-09-03T08:00:00.00+05:30': 67376,
'2020-09-04T08:00:00.00+05:30': 68472,
'2020-09-05T08:00:00.00+05:30': 69561,
'2020-09-06T08:00:00.00+05:30': 70626,
'2020-09-07T08:00:00.00+05:30': 71642,
'2020-09-08T08:00:00.00+05:30': 72775,
'2020-09-09T08:00:00.00+05:30': 73890,
'2020-09-10T08:00:00.00+05:30': 75062,
'2020-09-11T08:00:00.00+05:30': 76271,
'2020-09-12T08:00:00.00+05:30': 77472,
'2020-09-13T08:00:00.00+05:30': 78586,
'2020-09-14T08:00:00.00+05:30': 79722,
'2020-09-15T08:00:00.00+05:30': 80776,
'2020-09-16T08:00:00.00+05:30': 82066,
'2020-09-17T08:00:00.00+05:30': 83198,
'2020-09-18T08:00:00.00+05:30': 84372,
'2020-09-19T08:00:00.00+05:30': 85619,
'2020-09-20T08:00:00.00+05:30': 86752,
'2020-09-21T08:00:00.00+05:30': 87882,
'2020-09-22T08:00:00.00+05:30': 88935,
'2020-09-23T08:00:00.00+05:30': 90020,
'2020-09-24T08:00:00.00+05:30': 91149,
'2020-09-25T08:00:00.00+05:30': 92290,
'2020-09-26T08:00:00.00+05:30': 93379,
'2020-09-27T08:00:00.00+05:30': 94503,
'2020-09-28T08:00:00.00+05:30': 95542,
'2020-09-29T08:00:00.00+05:30': 96318,
'2020-09-30T08:00:00.00+05:30': 97497,
'2020-10-01T08:00:00.00+05:30': 98678,
'2020-10-02T08:00:00.00+05:30': 99773,
'2020-10-03T08:00:00.00+05:30': 100842,
'2020-10-04T08:00:00.00+05:30': 101782,
'2020-10-05T08:00:00.00+05:30': 102685,
'2020-10-06T08:00:00.00+05:30': 103569,
'2020-10-07T08:00:00.00+05:30': 104555,
'2020-10-08T08:00:00.00+05:30': 105526,
'2020-10-09T08:00:00.00+05:30': 106490,
'2020-10-10T08:00:00.00+05:30': 107416,
'2020-10-11T08:00:00.00+05:30': 108334,
'2020-10-12T08:00:00.00+05:30': 109150,
'2020-10-13T08:00:00.00+05:30': 109856,
'2020-10-14T08:00:00.00+05:30': 110586,
'2020-10-15T08:00:00.00+05:30': 111266,
'2020-10-16T08:00:00.00+05:30': 112161,
'2020-10-17T08:00:00.00+05:30': 112998,
'2020-10-18T08:00:00.00+05:30': 114031,
'2020-10-19T08:00:00.00+05:30': 114610,
'2020-10-20T08:00:00.00+05:30': 115197,
'2020-10-21T08:00:00.00+05:30': 115914,
'2020-10-22T08:00:00.00+05:30': 116616,
'2020-10-23T08:00:00.00+05:30': 117306,
'2020-10-24T08:00:00.00+05:30': 117956,
'2020-10-25T08:00:00.00+05:30': 118534,
'2020-10-26T08:00:00.00+05:30': 119014,
'2020-10-27T08:00:00.00+05:30': 119502,
'2020-10-28T08:00:00.00+05:30': 120010,
'2020-10-29T08:00:00.00+05:30': 120527,
'2020-10-30T08:00:00.00+05:30': 121090,
'2020-10-31T08:00:00.00+05:30': 121641,
'2020-11-01T08:00:00.00+05:30': 122111,
'2020-11-02T08:00:00.00+05:30': 122607,
'2020-11-03T08:00:00.00+05:30': 123097,
'2020-11-04T08:00:00.00+05:30': 123611,
'2020-11-05T08:00:00.00+05:30': 124315,
'2020-11-06T08:00:00.00+05:30': 124985,
'2020-11-07T08:00:00.00+05:30': 125562,
'2020-11-08T08:00:00.00+05:30': 126121,
'2020-11-09T08:00:00.00+05:30': 126611,
'2020-11-10T08:00:00.00+05:30': 127059,
'2020-11-11T08:00:00.00+05:30': 127571,
'2020-11-12T08:00:00.00+05:30': 128121,
'2020-11-13T08:00:00.00+05:30': 128668,
'2020-11-14T08:00:00.00+05:30': 129188,
'2020-11-15T08:00:00.00+05:30': 129635,
'2020-11-16T08:00:00.00+05:30': 130070,
'2020-11-17T08:00:00.00+05:30': 130519,
'2020-11-18T08:00:00.00+05:30': 130993,
'2020-11-19T08:00:00.00+05:30': 131578,
'2020-11-20T08:00:00.00+05:30': 132162,
'2020-11-21T08:00:00.00+05:30': 132726,
'2020-11-22T08:00:00.00+05:30': 133227,
'2020-11-23T08:00:00.00+05:30': 133738,
'2020-11-24T08:00:00.00+05:30': 134218,
'2020-11-25T08:00:00.00+05:30': 134699,
'2020-11-26T08:00:00.00+05:30': 135223,
'2020-11-27T08:00:00.00+05:30': 135715,
'2020-11-28T08:00:00.00+05:30': 136200,
'2020-11-29T08:00:00.00+05:30': 136696,
'2020-11-30T08:00:00.00+05:30': 137139,
'2020-12-01T08:00:00.00+05:30': 137621,
'2020-12-02T08:00:00.00+05:30': 138122,
'2020-12-03T08:00:00.00+05:30': 138648,
'2020-12-04T08:00:00.00+05:30': 139188,
'2020-12-05T08:00:00.00+05:30': 139700,
'2020-12-06T08:00:00.00+05:30': 140182,
'2020-12-07T08:00:00.00+05:30': 140573,
'2020-12-08T08:00:00.00+05:30': 140958,
'2020-12-09T08:00:00.00+05:30': 141360,
'2020-12-10T08:00:00.00+05:30': 141772,
'2020-12-11T08:00:00.00+05:30': 142186,
'2020-12-12T08:00:00.00+05:30': 142628,
'2020-12-13T08:00:00.00+05:30': 143019,
'2020-12-14T08:00:00.00+05:30': 143355,
'2020-12-15T08:00:00.00+05:30': 143709,
'2020-12-16T08:00:00.00+05:30': 144096,
'2020-12-17T08:00:00.00+05:30': 144451,
'2020-12-18T08:00:00.00+05:30': 144789,
'2020-12-19T08:00:00.00+05:30': 145136,
'2020-12-20T08:00:00.00+05:30': 145477,
'2020-12-21T08:00:00.00+05:30': 145810,
'2020-12-22T08:00:00.00+05:30': 146111,
'2020-12-23T08:00:00.00+05:30': 146444,
'2020-12-24T08:00:00.00+05:30': 146756,
'2020-12-25T08:00:00.00+05:30': 147092,
'2020-12-26T08:00:00.00+05:30': 147343,
'2020-12-27T08:00:00.00+05:30': 147622,
'2020-12-28T08:00:00.00+05:30': 147901,
'2020-12-29T08:00:00.00+05:30': 148153,
'2020-12-30T08:00:00.00+05:30': 148439,
'2020-12-31T08:00:00.00+05:30': 148738,
'2021-01-01T08:00:00.00+05:30': 148994,
'2021-01-02T08:00:00.00+05:30': 149218,
'2021-01-03T08:00:00.00+05:30': 149435,
'2021-01-04T08:00:00.00+05:30': 149649,
'2021-01-05T08:00:00.00+05:30': 149850,
'2021-01-06T08:00:00.00+05:30': 150114,
'2021-01-07T08:00:00.00+05:30': 150336,
'2021-01-08T08:00:00.00+05:30': 150570,
'2021-01-09T08:00:00.00+05:30': 150798,
'2021-01-10T08:00:00.00+05:30': 150999,
'2021-01-11T08:00:00.00+05:30': 151160,
'2021-01-12T08:00:00.00+05:30': 151327,
'2021-01-13T08:00:00.00+05:30': 151529,
'2021-01-14T08:00:00.00+05:30': 151727,
'2021-01-15T08:00:00.00+05:30': 151918,
'2021-01-16T08:00:00.00+05:30': 152093,
'2021-01-17T08:00:00.00+05:30': 152274,
'2021-01-18T08:00:00.00+05:30': 152419,
'2021-01-19T08:00:00.00+05:30': 152556,
'2021-01-20T08:00:00.00+05:30': 152718,
'2021-01-21T08:00:00.00+05:30': 152869,
'2021-01-22T08:00:00.00+05:30': 153032,
'2021-01-23T08:00:00.00+05:30': 153184,
'2021-01-24T08:00:00.00+05:30': 153339,
'2021-01-25T08:00:00.00+05:30': 153470,
'2021-01-26T08:00:00.00+05:30': 153587,
'2021-01-27T08:00:00.00+05:30': 153724,
'2021-01-28T08:00:00.00+05:30': 153847,
'2021-01-29T08:00:00.00+05:30': 154010,
'2021-01-30T08:00:00.00+05:30': 154147,
'2021-01-31T08:00:00.00+05:30': 154274,
'2021-02-01T08:00:00.00+05:30': 154392,
'2021-02-02T08:00:00.00+05:30': 154486,
'2021-02-03T08:00:00.00+05:30': 154596,
'2021-02-04T08:00:00.00+05:30': 154703,
'2021-02-05T08:00:00.00+05:30': 154823,
'2021-02-06T08:00:00.00+05:30': 154918,
'2021-02-07T08:00:00.00+05:30': 154996,
'2021-02-08T08:00:00.00+05:30': 155080,
'2021-02-09T08:00:00.00+05:30': 155158,
'2021-02-10T08:00:00.00+05:30': 155252,
'2021-02-11T08:00:00.00+05:30': 155360,
'2021-02-12T08:00:00.00+05:30': 155447,
'2021-02-13T08:00:00.00+05:30': 155550,
'2021-02-14T08:00:00.00+05:30': 155642,
'2021-02-15T08:00:00.00+05:30': 155732,
'2021-02-16T08:00:00.00+05:30': 155813,
'2021-02-17T08:00:00.00+05:30': 155913,
'2021-02-18T08:00:00.00+05:30': 156014,
'2021-02-19T08:00:00.00+05:30': 156111,
'2021-02-20T08:00:00.00+05:30': 156212,
'2021-02-21T08:00:00.00+05:30': 156302,
'2021-02-22T08:00:00.00+05:30': 156385,
'2021-02-23T08:00:00.00+05:30': 156463,
'2021-02-24T08:00:00.00+05:30': 156567,
'2021-02-25T08:00:00.00+05:30': 156705,
'2021-02-26T08:00:00.00+05:30': 156825,
'2021-02-27T08:00:00.00+05:30': 156938,
'2021-02-28T08:00:00.00+05:30': 157051,
'2021-03-01T08:00:00.00+05:30': 157157,
'2021-03-02T08:00:00.00+05:30': 157248,
'2021-03-03T08:00:00.00+05:30': 157346,
'2021-03-04T08:00:00.00+05:30': 157435,
'2021-03-05T08:00:00.00+05:30': 157548,
'2021-03-06T08:00:00.00+05:30': 157656,
'2021-03-07T08:00:00.00+05:30': 157756,
'2021-03-08T08:00:00.00+05:30': 157853,
'2021-03-09T08:00:00.00+05:30': 157930,
'2021-03-10T08:00:00.00+05:30': 158063,
'2021-03-11T08:00:00.00+05:30': 158189,
'2021-03-12T08:00:00.00+05:30': 158306,
'2021-03-13T08:00:00.00+05:30': 158446,
'2021-03-14T08:00:00.00+05:30': 158607,
'2021-03-15T08:00:00.00+05:30': 158725,
'2021-03-16T08:00:00.00+05:30': 158856,
'2021-03-17T08:00:00.00+05:30': 159044,
'2021-03-18T08:00:00.00+05:30': 159216,
'2021-03-19T08:00:00.00+05:30': 159370,
'2021-03-20T08:00:00.00+05:30': 159558,
'2021-03-21T08:00:00.00+05:30': 159755,
'2021-03-22T08:00:00.00+05:30': 159967,
'2021-03-23T08:00:00.00+05:30': 160166,
'2021-03-24T08:00:00.00+05:30': 160441,
'2021-03-25T08:00:00.00+05:30': 160692,
'2021-03-26T08:00:00.00+05:30': 160949,
'2021-03-27T08:00:00.00+05:30': 161240,
'2021-03-28T08:00:00.00+05:30': 161552,
'2021-03-29T08:00:00.00+05:30': 161843,
'2021-03-30T08:00:00.00+05:30': 162114,
'2021-03-31T08:00:00.00+05:30': 162468,
'2021-04-01T08:00:00.00+05:30': 162927,
'2021-04-02T08:00:00.00+05:30': 163396,
'2021-04-03T08:00:00.00+05:30': 164110,
'2021-04-04T08:00:00.00+05:30': 164623,
'2021-04-05T08:00:00.00+05:30': 165101,
'2021-04-06T08:00:00.00+05:30': 165547,
'2021-04-07T08:00:00.00+05:30': 166177,
'2021-04-08T08:00:00.00+05:30': 166862,
'2021-04-09T08:00:00.00+05:30': 167642,
'2021-04-10T08:00:00.00+05:30': 168436,
'2021-04-11T08:00:00.00+05:30': 169275,
'2021-04-12T08:00:00.00+05:30': 170179,
'2021-04-13T08:00:00.00+05:30': 171058,
'2021-04-14T08:00:00.00+05:30': 172085,
'2021-04-15T08:00:00.00+05:30': 173123,
'2021-04-16T08:00:00.00+05:30': 174308,
'2021-04-17T08:00:00.00+05:30': 175649,
'2021-04-18T08:00:00.00+05:30': 177150,
'2021-04-19T08:00:00.00+05:30': 178769,
'2021-04-20T08:00:00.00+05:30': 180530,
'2021-04-21T08:00:00.00+05:30': 182553,
'2021-04-22T08:00:00.00+05:30': 184657,
'2021-04-23T08:00:00.00+05:30': 186920,
'2021-04-24T08:00:00.00+05:30': 189544,
'2021-04-25T08:00:00.00+05:30': 192311,
'2021-04-26T08:00:00.00+05:30': 195123,
'2021-04-27T08:00:00.00+05:30': 197894,
'2021-04-28T08:00:00.00+05:30': 201187,
'2021-04-29T08:00:00.00+05:30': 204832,
'2021-04-30T08:00:00.00+05:30': 208330,
'2021-05-01T08:00:00.00+05:30': 211853,
'2021-05-02T08:00:00.00+05:30': 215542,
'2021-05-03T08:00:00.00+05:30': 218959,
'2021-05-04T08:00:00.00+05:30': 222408,
'2021-05-05T08:00:00.00+05:30': 226188,
'2021-05-06T08:00:00.00+05:30': 230168,
'2021-05-07T08:00:00.00+05:30': 234083,
'2021-05-08T08:00:00.00+05:30': 238270,
'2021-05-09T08:00:00.00+05:30': 242362,
'2021-05-10T08:00:00.00+05:30': 246116,
'2021-05-11T08:00:00.00+05:30': 249992,
'2021-05-12T08:00:00.00+05:30': 254197,
'2021-05-13T08:00:00.00+05:30': 258317}
For a tutorial on how to deal with time series data for machine learning, see https://www.pluralsight.com/guides/machine-learning-for-time-series-data-in-python
Here I add several new variables derived from the datetimes in the original data, because the number of COVID-19 related deaths may correlate with them. For example, more people might be outside, and thus exposed to COVID-19, on weekends than on weekdays, making the day of the week an important variable to track.
df3 = pd.DataFrame({"Datetime": deaths.keys(), "Deaths": deaths.values()})
# Converts the datetime string into a datetime object. We can then easily access
# attributes of this object, such as year, month, day, etc.
def create_datetime_variables(df3):
    df3['Datetime'] = pd.DatetimeIndex(df3['Datetime'])
    df3['Year'] = df3['Datetime'].dt.year
    df3['Month'] = df3['Datetime'].dt.month
    df3['Day'] = df3['Datetime'].dt.day
    df3['weekday'] = df3['Datetime'].dt.weekday
    df3['Hours'] = df3['Datetime'].dt.hour
    df3['Minutes'] = df3['Datetime'].dt.minute
    df3['Seconds'] = df3['Datetime'].dt.second
    return df3
df3 = create_datetime_variables(df3)
df3
| Datetime | Deaths | Year | Month | Day | weekday | Hours | Minutes | Seconds | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-30 13:33:00+05:30 | 0 | 2020 | 1 | 30 | 3 | 13 | 33 | 0 |
| 1 | 2020-02-02 10:39:00+05:30 | 0 | 2020 | 2 | 2 | 6 | 10 | 39 | 0 |
| 2 | 2020-02-03 12:13:00+05:30 | 0 | 2020 | 2 | 3 | 0 | 12 | 13 | 0 |
| 3 | 2020-03-02 14:28:00+05:30 | 0 | 2020 | 3 | 2 | 0 | 14 | 28 | 0 |
| 4 | 2020-03-03 19:36:00+05:30 | 0 | 2020 | 3 | 3 | 1 | 19 | 36 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 491 | 2021-05-09 08:00:00+05:30 | 242362 | 2021 | 5 | 9 | 6 | 8 | 0 | 0 |
| 492 | 2021-05-10 08:00:00+05:30 | 246116 | 2021 | 5 | 10 | 0 | 8 | 0 | 0 |
| 493 | 2021-05-11 08:00:00+05:30 | 249992 | 2021 | 5 | 11 | 1 | 8 | 0 | 0 |
| 494 | 2021-05-12 08:00:00+05:30 | 254197 | 2021 | 5 | 12 | 2 | 8 | 0 | 0 |
| 495 | 2021-05-13 08:00:00+05:30 | 258317 | 2021 | 5 | 13 | 3 | 8 | 0 | 0 |
496 rows × 9 columns
Looks like the Seconds column has a lot of zeroes ... let's check whether it is all zeroes. If it is, the column carries no information and can safely be removed.
df3.value_counts(subset='Seconds')
Seconds
0          496
dtype: int64
Now removing seconds from the dataframe, since all 496 values are 0 ...
df3.drop('Seconds', axis=1, inplace=True)
df3
| Datetime | Deaths | Year | Month | Day | weekday | Hours | Minutes | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-30 13:33:00+05:30 | 0 | 2020 | 1 | 30 | 3 | 13 | 33 |
| 1 | 2020-02-02 10:39:00+05:30 | 0 | 2020 | 2 | 2 | 6 | 10 | 39 |
| 2 | 2020-02-03 12:13:00+05:30 | 0 | 2020 | 2 | 3 | 0 | 12 | 13 |
| 3 | 2020-03-02 14:28:00+05:30 | 0 | 2020 | 3 | 2 | 0 | 14 | 28 |
| 4 | 2020-03-03 19:36:00+05:30 | 0 | 2020 | 3 | 3 | 1 | 19 | 36 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 491 | 2021-05-09 08:00:00+05:30 | 242362 | 2021 | 5 | 9 | 6 | 8 | 0 |
| 492 | 2021-05-10 08:00:00+05:30 | 246116 | 2021 | 5 | 10 | 0 | 8 | 0 |
| 493 | 2021-05-11 08:00:00+05:30 | 249992 | 2021 | 5 | 11 | 1 | 8 | 0 |
| 494 | 2021-05-12 08:00:00+05:30 | 254197 | 2021 | 5 | 12 | 2 | 8 | 0 |
| 495 | 2021-05-13 08:00:00+05:30 | 258317 | 2021 | 5 | 13 | 3 | 8 | 0 |
496 rows × 8 columns
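Dropping the constant Seconds column by name works here, but the same check can be written generically. A minimal sketch, assuming only standard pandas calls (the helper name `drop_constant_columns` is my own, not from the notebook):

```python
import pandas as pd

def drop_constant_columns(df):
    # A column with at most one distinct value carries no information
    # for a model, so it can be dropped without losing anything.
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant)
```

Applied to `df3`, this would remove Seconds (all zeroes) automatically, and would also catch any other constant column without a manual check.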
Although year, month, and weekday appear as numbers in the dataset, they are really categorical variables: numerical operations on them (such as adding two years) do not make sense, and the model should not treat, say, December as "twelve times" January. For this reason, we will one-hot encode these variables, creating a separate indicator column for each year, month, and weekday value.
Learn more about the get_dummies function from the documentation: https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html
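As a toy illustration on made-up data (not from the notebook), `drop_first` removes the first category's column, since its value is implied by all the others being zero:

```python
import pandas as pd

toy = pd.DataFrame({'Month': [1, 2, 3, 1]})
dummies = pd.get_dummies(toy, columns=['Month'], drop_first=True, prefix='month')
# month_1 is dropped: a row with month_2 == 0 and month_3 == 0 must be month 1.
print(list(dummies.columns))  # → ['month_2', 'month_3']
```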
# The drop_first parameter removes the first dummy column created for each categorical
# variable to avoid the dummy variable trap. For example, for year, we remove the year_2020 column.
df3 = pd.get_dummies(df3, columns=['Year'], drop_first=True, prefix='year')
df3 = pd.get_dummies(df3, columns=['Month'], drop_first=True, prefix='month')
df3 = pd.get_dummies(df3, columns=['weekday'], drop_first=True, prefix='wday')
df3
| Datetime | Deaths | Day | Hours | Minutes | year_2021 | month_2 | month_3 | month_4 | month_5 | ... | month_9 | month_10 | month_11 | month_12 | wday_1 | wday_2 | wday_3 | wday_4 | wday_5 | wday_6 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-30 13:33:00+05:30 | 0 | 30 | 13 | 33 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2020-02-02 10:39:00+05:30 | 0 | 2 | 10 | 39 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2020-02-03 12:13:00+05:30 | 0 | 3 | 12 | 13 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2020-03-02 14:28:00+05:30 | 0 | 2 | 14 | 28 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2020-03-03 19:36:00+05:30 | 0 | 3 | 19 | 36 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 491 | 2021-05-09 08:00:00+05:30 | 242362 | 9 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 492 | 2021-05-10 08:00:00+05:30 | 246116 | 10 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 493 | 2021-05-11 08:00:00+05:30 | 249992 | 11 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 494 | 2021-05-12 08:00:00+05:30 | 254197 | 12 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 495 | 2021-05-13 08:00:00+05:30 | 258317 | 13 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
496 rows × 23 columns
Now that we have created the necessary variables, it's time to start machine learning. I will start by dividing the data into training and test sets by an 80-20 split.
from sklearn.model_selection import train_test_split
import numpy as np
X = df3.iloc[:, 2:] # all feature columns, skipping both the Datetime and Deaths columns
y = np.array(df3['Deaths'].values) # df3['Deaths'] is a Series, we only want the values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, random_state=42)
Now that we have the data split appropriately, it's time to create a model. We will use a Decision Tree as our first basic model to see how well it performs.
from sklearn import model_selection
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
Here we create the Decision Tree model and evaluate its performance using the root mean squared error (RMSE) and the r² coefficient of determination.
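To make the two metrics concrete, here is what they compute on a handful of made-up numbers (the `y_true`/`y_pred` values are hypothetical, chosen only for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 310.0])

# RMSE: square root of the average squared error, in the target's own units.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
# r^2: 1 - SS_res / SS_tot; 1.0 means perfect predictions.
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The hand computations match sklearn's implementations.
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2, r2_score(y_true, y_pred))
```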
# This function evaluates a model's performance on the provided training and test datasets
# using the RMSE and r^2 statistics. The model is fitted to the training data, then asked to
# make predictions on the training and test sets, which are then used to compute
# the two statistical measures.
def evaluate(model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train)
    # Train
    pred_train_tree = model.predict(X_train)
    print(np.sqrt(mean_squared_error(y_train, pred_train_tree)))
    print(r2_score(y_train, pred_train_tree))
    # Test
    pred_test_tree = model.predict(X_test)
    print(np.sqrt(mean_squared_error(y_test, pred_test_tree)))
    print(r2_score(y_test, pred_test_tree))
dtree = DecisionTreeRegressor(max_depth=10, min_samples_leaf=0.05, random_state=3)
evaluate(dtree, X_train, y_train, X_test, y_test)
20312.731090822366
0.9220945117143983
31115.479604373548
0.8108586299390869
The RMSEs for both training and testing are relatively low compared to the scale of the target (cumulative deaths reach the hundreds of thousands), and the r² scores on both sets are quite good. But let's see if we can do even better with a different model.
Here we are using a random forest model and evaluating its performance in the same way.
model_rf = RandomForestRegressor(n_estimators=2000, oob_score=True, random_state=100)
evaluate(model_rf, X_train, y_train, X_test, y_test)
1915.2780702096438
0.9993073799048235
3258.5495662973694
0.9979256492406713
The random forest performs phenomenally well, even better than the Decision Tree: the RMSEs are an order of magnitude lower, and the r² scores in both training and testing are higher still. Let us now see the number of deaths predicted for India overall.
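Since the forest was built with `oob_score=True`, sklearn also records an out-of-bag r² after fitting: each sample is scored only by the trees that never saw it during bootstrap sampling, giving a built-in validation estimate for free. A small sketch on synthetic data (the toy arrays below are my own, not the notebook's):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem: a linear target with a little noise.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 3))
y_toy = X_toy @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X_toy, y_toy)
# Out-of-bag r^2: each sample is predicted only by trees it was left out of.
print(rf.oob_score_)
```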
We will start by generating the dates required for the next 3 months.
import datetime

# Creates a new datetime with the same hour/minute/second/microsecond information as
# startDate, but with a given year, month, and day.
def new_date(startDate, year, month, day):
    return datetime.datetime(year, month, day, startDate.hour, startDate.minute,
                             startDate.second, startDate.microsecond)
# Generates datetimes from now up to (years, months, days) ahead,
# stepping by year_step, month_step, and day_step respectively.
def generate_datetimes(years, months, days, year_step, month_step, day_step):
    current = datetime.datetime.now()
    dates = [current]
    for year in range(current.year, current.year + years, year_step):
        for month in range(current.month, current.month + months, month_step):
            for day in range(current.day, current.day + days, day_step):
                # crude validity guard: skips invalid dates, but note it also
                # skips December and days on or after the 28th
                if month < 12 and day < 28:
                    date = new_date(current, year, month, day)
                    dates.append(date)
    return dates
dates = generate_datetimes(1, 3, 28, 1, 1, 5)
dates
[datetime.datetime(2021, 5, 14, 22, 21, 46, 852506), datetime.datetime(2021, 5, 14, 22, 21, 46, 852506), datetime.datetime(2021, 5, 19, 22, 21, 46, 852506), datetime.datetime(2021, 5, 24, 22, 21, 46, 852506), datetime.datetime(2021, 6, 14, 22, 21, 46, 852506), datetime.datetime(2021, 6, 19, 22, 21, 46, 852506), datetime.datetime(2021, 6, 24, 22, 21, 46, 852506), datetime.datetime(2021, 7, 14, 22, 21, 46, 852506), datetime.datetime(2021, 7, 19, 22, 21, 46, 852506), datetime.datetime(2021, 7, 24, 22, 21, 46, 852506)]
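The nested loops above work, but pandas can also generate a future date grid directly. A simpler sketch covering roughly the same 3 months at 5-day steps (the start date is hard-coded here purely for illustration; the notebook uses the current date instead):

```python
import pandas as pd

# 18 steps of 5 days ≈ 3 months of future dates.
future_dates = pd.date_range(start="2021-05-14", periods=18, freq="5D")
print(future_dates[0], future_dates[-1])
```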
df4 = pd.DataFrame(dates)
df4.columns = ['Datetime']
df4 = create_datetime_variables(df4)
df4
| Datetime | Year | Month | Day | weekday | Hours | Minutes | Seconds | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-05-14 22:03:47.119701 | 2021 | 5 | 14 | 4 | 22 | 3 | 47 |
| 1 | 2021-05-14 22:03:47.119701 | 2021 | 5 | 14 | 4 | 22 | 3 | 47 |
| 2 | 2021-05-19 22:03:47.119701 | 2021 | 5 | 19 | 2 | 22 | 3 | 47 |
| 3 | 2021-05-24 22:03:47.119701 | 2021 | 5 | 24 | 0 | 22 | 3 | 47 |
| 4 | 2021-06-14 22:03:47.119701 | 2021 | 6 | 14 | 0 | 22 | 3 | 47 |
| 5 | 2021-06-19 22:03:47.119701 | 2021 | 6 | 19 | 5 | 22 | 3 | 47 |
| 6 | 2021-06-24 22:03:47.119701 | 2021 | 6 | 24 | 3 | 22 | 3 | 47 |
| 7 | 2021-07-14 22:03:47.119701 | 2021 | 7 | 14 | 2 | 22 | 3 | 47 |
| 8 | 2021-07-19 22:03:47.119701 | 2021 | 7 | 19 | 0 | 22 | 3 | 47 |
| 9 | 2021-07-24 22:03:47.119701 | 2021 | 7 | 24 | 5 | 22 | 3 | 47 |
df4 = pd.get_dummies(df4, columns=['Year'], prefix='year')
df4.iloc[:, 1:]
| Month | Day | weekday | Hours | Minutes | Seconds | year_2021 | |
|---|---|---|---|---|---|---|---|
| 0 | 5 | 14 | 4 | 22 | 3 | 47 | 1 |
| 1 | 5 | 14 | 4 | 22 | 3 | 47 | 1 |
| 2 | 5 | 19 | 2 | 22 | 3 | 47 | 1 |
| 3 | 5 | 24 | 0 | 22 | 3 | 47 | 1 |
| 4 | 6 | 14 | 0 | 22 | 3 | 47 | 1 |
| 5 | 6 | 19 | 5 | 22 | 3 | 47 | 1 |
| 6 | 6 | 24 | 3 | 22 | 3 | 47 | 1 |
| 7 | 7 | 14 | 2 | 22 | 3 | 47 | 1 |
| 8 | 7 | 19 | 0 | 22 | 3 | 47 | 1 |
| 9 | 7 | 24 | 5 | 22 | 3 | 47 | 1 |
empty_cols = ['month_2', 'month_3', 'month_4',
'month_5', 'month_6', 'month_7', 'month_8', 'month_9', 'month_10',
'month_11', 'month_12', 'wday_1', 'wday_2', 'wday_3', 'wday_4',
'wday_5', 'wday_6']
for col in empty_cols:
    if col not in df4:
        df4[col] = 0
df4.drop(columns=['Month', 'Seconds', 'weekday'], inplace=True)
df4.iloc[:, 1:]
| Day | Hours | Minutes | year_2021 | month_2 | month_3 | month_4 | month_5 | month_6 | month_7 | ... | month_9 | month_10 | month_11 | month_12 | wday_1 | wday_2 | wday_3 | wday_4 | wday_5 | wday_6 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 14 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 19 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 24 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 14 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 19 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 24 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 14 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 19 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 24 | 22 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10 rows × 21 columns
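The loop that adds the missing dummy columns can also be written as a single `DataFrame.reindex` against the training feature columns, which both fills missing columns with 0 and fixes the column order in one step. A toy sketch (the column names below are made up for illustration):

```python
import pandas as pd

train_cols = ['Day', 'month_2', 'month_3', 'wday_1']
new_data = pd.DataFrame({'Day': [14], 'month_3': [1]})

# reindex adds any missing training columns (filled with 0) and
# reorders the columns to match the training layout exactly.
aligned = new_data.reindex(columns=train_cols, fill_value=0)
print(list(aligned.columns))  # → ['Day', 'month_2', 'month_3', 'wday_1']
```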
X
| Day | Hours | Minutes | year_2021 | month_2 | month_3 | month_4 | month_5 | month_6 | month_7 | ... | month_9 | month_10 | month_11 | month_12 | wday_1 | wday_2 | wday_3 | wday_4 | wday_5 | wday_6 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 13 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 10 | 39 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 3 | 12 | 13 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2 | 14 | 28 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3 | 19 | 36 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 491 | 9 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 492 | 10 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 493 | 11 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 494 | 12 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 495 | 13 | 8 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
496 rows × 21 columns
df4.iloc[:, 1:].columns == X.columns
array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True])
model_rf.predict(df4.iloc[:, 1:])
array([152507.395, 152507.395, 153601.792, 153675.184, 152507.395,
153601.792, 153675.184, 152507.395, 153601.792, 153675.184])
Our model predicts a cumulative death toll of roughly 153,000 over the next 3 months. Note, however, that this is actually below the most recent cumulative total of about 258,000, which is impossible for a cumulative count: tree-based models predict by averaging target values seen during training, so they cannot extrapolate beyond the range of the training data, and these predictions should be read with that limitation in mind.
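The inability of trees to extrapolate can be demonstrated quickly on made-up data: on a perfectly linear trend, a decision tree's prediction far beyond the training range is capped at the largest leaf value it learned rather than following the trend.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy linear trend: y = 2x for x in [0, 99].
X_lin = np.arange(100, dtype=float).reshape(-1, 1)
y_lin = 2.0 * np.arange(100, dtype=float)

tree = DecisionTreeRegressor(random_state=0).fit(X_lin, y_lin)

# Far outside the training range, the prediction cannot exceed
# the maximum target value seen in training (198.0 here).
print(tree.predict([[500.0]]))
```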
Maharashtra is the state with the highest number of cases and deaths.
Southern India seems to be more impacted by COVID-19 than Northern India, with Kerala as a notable exception.